MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation
We present a novel method for exemplar-based image translation, called
matching interleaved diffusion models (MIDMs). Most existing methods for this
task were formulated as a GAN-based matching-then-generation framework. However,
in this framework, matching errors induced by the difficulty of semantic
matching across domains, e.g., sketch and photo, are easily propagated
to the generation step, which in turn leads to degenerate results. Motivated
by the recent success of diffusion models in overcoming the shortcomings of GANs,
we incorporate diffusion models to address these limitations.
Specifically, we formulate a diffusion-based matching-and-generation framework
that interleaves cross-domain matching and diffusion steps in the latent space
by iteratively feeding the intermediate warp into the noising process and
denoising it to generate a translated image. In addition, to improve the
reliability of the diffusion process, we design a confidence-aware process
using cycle-consistency to consider only confident regions during translation.
Experimental results show that our MIDMs generate more plausible images than
state-of-the-art methods.
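To make the interleaving concrete, the following PyTorch-style sketch shows one plausible reading of the loop described above: the exemplar latent is warped toward the condition by cross-domain matching, kept only where cycle-consistency is confident, and then passed through a denoising step. All names (matcher, denoiser, cycle_consistency_mask) and tensor shapes are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of an interleaved matching-and-diffusion loop
# (hypothetical helper names and shapes; not the authors' released code).
import torch

def cycle_consistency_mask(fwd_corr, bwd_corr, threshold=0.5):
    # fwd_corr, bwd_corr: (B, HW, HW) soft correspondences condition->exemplar
    # and exemplar->condition. A position is confident when the forward-backward
    # round trip returns (approximately) to itself.
    cycle = torch.bmm(fwd_corr, bwd_corr)            # (B, HW, HW)
    self_match = cycle.diagonal(dim1=1, dim2=2)      # (B, HW)
    return (self_match > threshold).float()          # 1 = confident region

def interleaved_sampling(denoiser, matcher, cond_feat, exemplar_lat, steps=50):
    # denoiser(z_t, t, cond) -> z_{t-1};  matcher(cond, lat) -> (warped, fwd, bwd)
    b, c, h, w = exemplar_lat.shape
    z = torch.randn_like(exemplar_lat)
    for t in reversed(range(steps)):
        warped, fwd, bwd = matcher(cond_feat, exemplar_lat)
        conf = cycle_consistency_mask(fwd, bwd).view(b, 1, h, w)
        # Feed the intermediate warp into the process only where matching is confident.
        z = conf * warped + (1.0 - conf) * z
        # One reverse-diffusion (denoising) step conditioned on the input domain.
        z = denoiser(z, t, cond_feat)
    return z
```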
Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency
We present an end-to-end joint training framework that explicitly models
6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular
camera setup without supervision. Our technical contributions are three-fold.
First, we highlight the fundamental difference between inverse and forward
projection while modeling the individual motion of each rigid object, and
propose a geometrically correct projection pipeline using a neural forward
projection module. Second, we design a unified instance-aware photometric and
geometric consistency loss that holistically imposes self-supervisory signals
for every background and object region. Lastly, we introduce a general-purpose
auto-annotation scheme using any off-the-shelf instance segmentation and
optical flow models to produce video instance segmentation maps that will be
utilized as input to our training pipeline. These proposed elements are
validated in a detailed ablation study. Through extensive experiments conducted
on the KITTI and Cityscapes datasets, our framework is shown to outperform the
state-of-the-art depth and motion estimation methods. Our code, dataset, and
models are available at https://github.com/SeokjuLee/Insta-DM.
Comment: Accepted to AAAI 2021. arXiv admin note: substantial text overlap with arXiv:1912.0935
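As a rough illustration of the instance-aware consistency idea, the sketch below composites per-object photometric errors with a background term, assuming each rigid object has already been warped with its own 6-DoF motion and that video instance masks are available. The tensor layouts and the function name are assumptions for illustration, not the released Insta-DM code.

```python
# Minimal sketch of an instance-aware photometric consistency loss
# (hypothetical tensor layout; per-object warps assumed precomputed).
import torch

def instance_aware_photometric_loss(target, warped_bg, warped_objs, obj_masks, eps=1e-7):
    # target:      (B, 3, H, W)    reference frame
    # warped_bg:   (B, 3, H, W)    source frame warped with ego-motion + depth only
    # warped_objs: (B, N, 3, H, W) source frame warped per object with its own motion
    # obj_masks:   (B, N, 1, H, W) video instance segmentation masks (0/1)
    bg_mask = 1.0 - obj_masks.sum(dim=1).clamp(max=1.0)          # (B, 1, H, W)
    # Photometric error on the static background region (ego-motion warp).
    bg_err = (bg_mask * (target - warped_bg).abs()).sum()
    # Photometric error on each dynamic object region, using its own warp.
    obj_err = (obj_masks * (target.unsqueeze(1) - warped_objs).abs()).sum()
    denom = 3.0 * (bg_mask.sum() + obj_masks.sum()) + eps        # normalize per pixel/channel
    return (bg_err + obj_err) / denom
```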
ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation
Open compound domain adaptation (OCDA) considers the target domain as the
compound of multiple unknown homogeneous subdomains. The goal of OCDA is to
minimize the domain gap between the labeled source domain and the unlabeled
compound target domain, which benefits the model's generalization to unseen
domains. Current OCDA methods for semantic segmentation adopt manual domain
separation and employ a single model to simultaneously adapt to all the target
subdomains. However, adapting to a target subdomain might hinder the model from
adapting to other dissimilar target subdomains, which leads to limited
performance. In this work, we introduce a multi-teacher framework with
bidirectional photometric mixing to separately adapt to every target subdomain.
First, we present an automatic domain separation to find the optimal number of
subdomains. On this basis, we propose a multi-teacher framework in which each
teacher model uses bidirectional photometric mixing to adapt to one target
subdomain. Furthermore, we conduct an adaptive distillation to learn a student
model and apply consistency regularization to improve the student
generalization. Experimental results on benchmark datasets show the efficacy of
the proposed approach for both the compound domain and the open domains against
existing state-of-the-art approaches.
Comment: Accepted to ECCV 202
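For a concrete, if simplified, picture of photometric mixing, the sketch below approximates it with channel-wise statistics transfer applied in both directions between a source image and a target-subdomain image. The actual mixing operation and the alpha blending weight in the paper may differ, so treat the function names and parameters as assumptions.

```python
# Simplified sketch of bidirectional photometric mixing between a labeled
# source image and a target-subdomain image, approximated here by per-channel
# mean/std transfer (an assumption, not the paper's exact operation).
import torch

def photometric_transfer(content, style, eps=1e-6):
    # Re-colors `content` (B, 3, H, W) with the per-channel statistics of `style`.
    c_mean = content.mean(dim=(-2, -1), keepdim=True)
    c_std = content.std(dim=(-2, -1), keepdim=True)
    s_mean = style.mean(dim=(-2, -1), keepdim=True)
    s_std = style.std(dim=(-2, -1), keepdim=True)
    return (content - c_mean) / (c_std + eps) * s_std + s_mean

def bidirectional_mixing(source_img, target_img, alpha=0.5):
    # source -> target style and target -> source style, blended with the originals.
    src2tgt = alpha * photometric_transfer(source_img, target_img) + (1 - alpha) * source_img
    tgt2src = alpha * photometric_transfer(target_img, source_img) + (1 - alpha) * target_img
    return src2tgt, tgt2src
```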
DiffFace: Diffusion-based Face Swapping with Facial Guidance
In this paper, we propose, for the first time, a diffusion-based face swapping
framework, called DiffFace, composed of an ID-conditional DDPM, sampling with
facial guidance, and target-preserving blending. Specifically, in the training
process, the ID-conditional DDPM is trained to generate face images with the
desired identity. In the sampling process, we use off-the-shelf facial expert
models to guide the model to transfer the source identity while faithfully
preserving the target attributes. During this process, to preserve the
background of the target image and obtain the desired face swapping result, we
additionally propose a target-preserving blending strategy. It helps our model
preserve the attributes of the target face against noise while transferring the
source facial identity. In addition, without any re-training, our model can
flexibly apply additional facial guidance and adaptively control the
ID-attributes trade-off to achieve the desired results. To the best of our
knowledge, this is the first approach that applies the diffusion model to the
face swapping task. Compared with previous GAN-based approaches, DiffFace takes
advantage of the diffusion model and offers benefits such as training stability,
high fidelity, sample diversity, and controllability. Extensive experiments
show that our DiffFace is comparable
or superior to the state-of-the-art methods on several standard face swapping
benchmarks.
Comment: Project Page: https://hxngiee.github.io/DiffFac
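To illustrate how a target-preserving blending step can sit inside a DDPM sampling loop, here is a hedged sketch: outside a face mask, the current sample is overwritten with a noised copy of the target at the matching timestep, so only the masked face region is synthesized while the target's background and attributes are carried through. The helper signatures (denoise_step, q_sample) and the exact blending schedule are assumptions, not DiffFace's released implementation.

```python
# Hedged sketch of target-preserving blending inside an ID-conditioned
# DDPM sampling loop (hypothetical helper signatures).
import torch

def face_swap_sampling(denoise_step, q_sample, target_img, face_mask, src_id_emb, steps=1000):
    # denoise_step(x_t, t, id_emb) -> x_{t-1}  (one reverse DDPM step, ID-conditioned)
    # q_sample(x_0, t)             -> x_t      (forward noising of a clean image)
    # face_mask: (B, 1, H, W), 1 inside the face region to be swapped
    x = torch.randn_like(target_img)
    for t in reversed(range(steps)):
        x = denoise_step(x, t, src_id_emb)
        if t > 0:
            # Paste the target, noised to the current level, everywhere
            # outside the face mask so background/attributes are preserved.
            target_t = q_sample(target_img, t - 1)
            x = face_mask * x + (1.0 - face_mask) * target_t
        else:
            # Final step: paste the clean target outside the face region.
            x = face_mask * x + (1.0 - face_mask) * target_img
    return x
```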